1.   Introduction

In  numerical  weather  prediction,  objective  analysis  is  the 
process  of  combining  information  obtained  from  observations  of 
meteorological  variables  with  that  from  the  numerical  prediction 
process.   The  resulting  "analyzed"  values  are  used  to  prepare 
weather  maps,  as  well  as  to  initialize  the  variables  for  the  next 
weather  prediction  cycle.   The  problem  is  inherently  a  multivar- 
iate one  since  the  variables  are  not  independent,  e.g.,  pressure 
heights  are  related  to  winds.   The  predicted  values  are  on  a 
regular  grid,  and  have  errors  which  are  spatially  correlated. 
The  observed  values  are  measured  imperfectly,  and  occur  at  irreg- 
ularly spaced  (scattered)  points  (both  in  space  and  time).   The 
errors  in  the  observations  sometimes  occur  independently,  with 
zero  mean,  and  in  other  cases,  such  as  satellite  observations, 
are  biased  with  correlated  errors. 

The  traditional  approach  to  the  problem  is  a  two  step 
process.   The  predicted  values  are  treated  as  a  first-guess  and 
interpolated  from  the  grid  to  the  observation  points.   The  dif- 
ference between  the  first-guess  values  interpolated  to  the  obser- 
vation points  and  the  observed  values,  called  the  first-guess 
error,  is  then  interpolated  back  to  the  grid  points  as  a  correc- 
tion to  the  first-guess  values.   The  interpolation  from  grid-to- 
observation  points  is  the  "easy"  process,  and  has  not  received 
much attention in the literature. The procedure generally used
is multilinear interpolation (e.g., Bergman, 1979, or Lorenc,
1981), although recent investigations by the author (Franke,
1985) have demonstrated that appreciable error may occur in this
step. The interpolation from observation-to-grid points is the
"hard"  problem  and  has  received  widespread  attention.   Histor- 
ically the  favored  scheme  has  been  a  weighted  average  scheme, 
originally  introduced  by  Cressman  (1959),  with  a  variation  due  to 
Barnes (1973). Currently the method of choice is a statistical
scheme known in the meteorological literature as Optimum
Interpolation (OI), and in other disciplines by other names (e.g.,
Kriging in the mining and geology literature).
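
The two-step process described above can be sketched in a few lines. The following is only an illustrative sketch, not the code used in this study: the function names, the unit grid spacing, and the convention that an increment is "observed value minus interpolated first guess" are assumptions made for the example. The weight is the Cressman (1959) form, (R^2 - d^2)/(R^2 + d^2) inside an influence radius R and zero outside.

```python
import math

def bilinear(field, x, y):
    """Grid-to-observation step: interpolate field[j][i] (unit grid
    spacing assumed) to the point (x, y)."""
    i = min(max(int(math.floor(x)), 0), len(field[0]) - 2)
    j = min(max(int(math.floor(y)), 0), len(field) - 2)
    tx, ty = x - i, y - j
    return ((1 - tx) * (1 - ty) * field[j][i]
            + tx * (1 - ty) * field[j][i + 1]
            + (1 - tx) * ty * field[j + 1][i]
            + tx * ty * field[j + 1][i + 1])

def cressman_weight(d, radius):
    """Cressman weight for an observation at distance d from the grid point."""
    if d >= radius:
        return 0.0
    return (radius ** 2 - d ** 2) / (radius ** 2 + d ** 2)

def cressman_correction(grid_pt, obs_pts, increments, radius):
    """Observation-to-grid step: weighted average of the increments
    (assumed here to be observation minus interpolated first guess)."""
    num = 0.0
    den = 0.0
    for (x, y), inc in zip(obs_pts, increments):
        w = cressman_weight(math.hypot(grid_pt[0] - x, grid_pt[1] - y), radius)
        num += w * inc
        den += w
    # No observation within the radius: leave the first guess unchanged.
    return num / den if den > 0.0 else 0.0
```

The analyzed value at a grid point would then be the first-guess value plus the returned correction.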

The interpolation process known as OI has its roots in the
work of Wiener and Kolmogorov, and was introduced to the
meteorological literature by Gandin (1963). The theory of the process
depends  on  it  being  applied  to  a  random  function  with  known 
spatial  statistics.   In  particular  it  is  assumed  that  the  spatial 
covariance  structure  of  the  class  of  functions  to  which  it  is 
applied  is  known.   In  addition  it  is  necessary  to  know  the  error 
statistics  of  the  observation  devices.   If  this  is  the  case,  then 
the  process  yields  the  best  answer  possible  in  the  sense  that  the 
variance  of  the  error  is  minimized  over  all  functions  in  the 
class.   For  meteorological  purposes,  this  means  the  covariance 
structure  of  an  ensemble  of  realizations  must  be  known,  and  then 
the  mean  squared  error  over  the  entire  ensemble  is  minimized. 
Using  standard  least  squares  methods,  the  variance  of  the  expect- 
ed error  is  easily  computed,  and  much  emphasis  has  been  put  on 
this  as  an  advantage  of  the  method. 

There have been numerous papers about the multivariate
application of OI to the objective analysis problem. These are of
an applications nature, and it is difficult to separate the
behavior of such schemes from that of the other processes
involved. In studies of objective analysis using simulated data
to  attempt  to  learn  something  about  the  properties  of  the  scheme, 
many  simplifications  are  required.   This  study  is  no  different. 
The univariate (only one meteorological variable is treated, in
this case the 500 mb pressure height surface) application of OI
and other schemes is investigated. Because the generation of
simulated  data  with  specified  spatial  correlation  properties 
requires  the  factorization  of  the  correlation  matrix  for  the 
first  guess  error  at  the  grid  points,  it  is  necessary  to  work 
with a relatively small grid. Further, the problem of non-
synoptic observation of variables is not treated; rather, all
observations are assumed to be made at the same time, the time at
which the particular realization occurs. Within the prescribed
limitations, the procedure used is valid and yields information
about the objective analysis process which should prove to be
useful in practice.

A  somewhat  different  way  of  looking  at  the  problem  was 
proposed by Wahba and Wendelberger (1980). See also Wendelberger
(1981). In their work, no first guess was necessary or assumed;
all data were considered to be observation values. Thus the
underlying  field  to  be  approximated  was  treated  directly,  rather 
than  making  a  correction  to  the  first-guess  field.   The  overall 
process  involved  the  use  of  Laplacian  smoothing  splines  and 
generalized  cross  validation  to  determine  a  suitable  value  for 
the  smoothing  parameter.   If  a  first  guess  is  available,  with 
known correlated errors, then ignoring this information is
probably unwise. The first guess can be used in the traditional
manner, with the Laplacian smoothing splines applied to the
first-guess error. It is also possible to apply the Laplacian
smoothing  splines  to  all  of  the  data.   Thus,  part  of  the  invest- 
igation reported  here  involved  the  use  of  Laplacian  smoothing 
splines  and  generalized  cross  validation  for  the  smoothing  par- 
ameter in  a  scheme  that  approximates  the  underlying  field  direct- 
ly, but  that  also  makes  use  of  all  available  data  in  a  way  that 
accounts  for  the  correlation  of  the  errors.   The  program  used  was 
a  modified  version  of  the  program  MSSP,  available  from  the 
Madison Academic Computing Center, University of Wisconsin.

Section 2 gives an outline of the goals of this study,
background  information  about  the  methods  of  objective  analysis 
considered,  and  aspects  of  the  schemes  investigated.   The  results 
of  the  study  are  given  and  discussed  in  Section  3.   Finally,  the 
implications  of  the  results  and  conclusions  about  approaches  to 
objective  analysis,  and  suggestions  for  further  study  are  given 
in  Section  4. 

2.   Goals  of  the  study 

This  study  had  two  principal  goals:   (1)   To  investigate  the 
efficacy  of  generalized  cross  validation  (GCV)  in  determining  the 
smoothing  parameter  used  in  Laplacian  smoothing  splines  (LSS), 
and  (2)   To  test  the  possibility  of  treating  first-guess  values 
and  observed  values  in  a  unified  method  with  LSS.   The  smoothing 
parameter  value  must  be  given  in  order  to  use  LSS,  and  Wahba  and 
Wendelberger (1980) have indicated that GCV might be a good way
to  choose  the  value.   In  this  study  I  performed  simulations  to 
determine  if  GCV  could  adapt  properly  to  particular  realizations 
in  an  ensemble  with  specified  error  statistics. 


The  advantage  of  a  unified  scheme  for  both  first-guess  and 
observed  values  is  that  it  potentially  makes  it  possible  to 
obtain  better  analyses  where  the  observations  are  sparse  compared 
to  the  grid  or  correlation  distances.   The  LSS  method  used  in 
this  investigation  was  the  scheme  proposed  by  Wahba  and 
Wendelberger  (1980),  which  is  described  more  fully  in 
Wendelberger  (1981,  1982).   The  general  framework  of  this  study 
follows  that  of  a  previous  investigation  (Franke,  1985). 

A  brief  description  of  the  setting  in  which  the  numerical 
experiments  were  performed  follows.   An  underlying  function  to  be 
approximated  was  chosen.   The  simulated  pressure  height  field 
described by Koehler (1979) was used, at the 500 mb level, with
random values for two parameters, $\theta_0$ (chosen uniformly
distributed on [-112.5°, -82.5°]), and $\Delta\theta$ (chosen
uniformly distributed on [-15°, 15°]). One possible realization
of the field is shown
in  Figure  4.   The  underlying  field  was  then  evaluated  on  a  rec- 
tangular grid.   Normally  distributed  first-guess  errors  with 
specified  spatial  covariance  were  generated  and  added  to  the 
field  values  to  obtain  the  first-guess  values.   Then,  the  under- 
lying field  was  evaluated  at  a  set  of  observation  points,  and 
normally  distributed  independent  observation  errors  with  speci- 
fied variance  were  added  to  these  values  to  obtain  observation 
values.   An  objective  analysis  scheme  was  then  applied  using  the 
first-guess  and  observation  values  to  obtain  estimates  of  the 
underlying  field  at  the  grid  points;   these  are  called  the  ana- 
lyzed values  of  the  field.   The  errors  in  the  analyzed  values 
were then computed. After repeating the process for many
realizations, estimates of the root-mean-square (rms) error were
obtained.
In order to avoid edge effects, rms errors for the first-guess
and  analyzed  values  were  tabulated  only  over  the  interior  grid 
points.   In  a  previous  study  (Franke,  1985),  this  process  was 
used  to  obtain  simulated  results  using  various  objective  analysis 
schemes,  under  various  assumptions  about  parameters  in  statisti- 
cal schemes  and  other  methods.   In  the  current  study  this  process 
is  the  starting  point  for  investigations  indicated  above. 
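
The generation of spatially correlated first-guess errors, which the Introduction notes requires factoring the correlation matrix, can be sketched as below. This is a minimal illustration under stated assumptions, not the simulation program itself: the function names are hypothetical, a Cholesky factor is used as the matrix factorization, the covariance is the Gaussian form used later in this section, and positions are plain (longitude, latitude) pairs in degrees.

```python
import math
import random

def gaussian_cov(p, q, r_g=30.0, c_d=10.0):
    """Gaussian covariance r_g^2 exp(-(|P-Q|/c_d)^2), distance in degrees."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return r_g ** 2 * math.exp(-d2 / c_d ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simulate_first_guess_errors(points, rng, r_g=30.0, c_d=10.0):
    """One realization of correlated, zero-mean normal first-guess errors."""
    cov = [[gaussian_cov(p, q, r_g, c_d) for q in points] for p in points]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in points]  # independent N(0, 1) draws
    return [sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(points))]

def simulate_observation_errors(n_obs, rng, r_o=10.0):
    """Independent zero-mean normal observation errors with std r_o."""
    return [rng.gauss(0.0, r_o) for _ in range(n_obs)]
```

Because the Gaussian covariance matrix becomes poorly conditioned as the correlation distance grows relative to the grid spacing, a plain Cholesky factorization like this one can fail for large grids, which is consistent with the need to work with a relatively small grid.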

The approach taken for OI is to view the approximation as a
linear combination of the spatial covariance functions for the
observation points,

$$F(P) = \sum_{k=1}^{N_o} a_k\, C(P, P_k).$$

Here $C(P,Q)$ is the stationary, isotropic covariance function for
the first-guess error, $F(P)$ is the approximating function, the
observation points are $P_k$, with first-guess values $F_k$,
$k = 1, \ldots, N_o$, and the $a_k$ satisfy the system of equations

The $r_k$ denote the standard deviations of the observation errors
at $P_k$.
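
As an illustration of how such an approximation is evaluated, the sketch below uses the textbook form of optimum interpolation, in which the coefficients solve a linear system whose matrix is the first-guess error covariance between observation points plus the observation error variances on the diagonal. That system, the function names, the default parameter values, and the use of the Gaussian covariance are assumptions made for this example; they are not taken from the equations of this report.

```python
import math

def gaussian_cov(p, q, r_g, c_d):
    """Gaussian first-guess error covariance; c_d = 0 means uncorrelated."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    if c_d == 0.0:
        return r_g ** 2 if d2 == 0.0 else 0.0
    return r_g ** 2 * math.exp(-d2 / c_d ** 2)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def oi_correction(grid_pt, obs_pts, increments, r_g=30.0, r_o=10.0, c_d=10.0):
    """Return (correction, expected rms error) at one grid point.
    increments: observed minus first-guess values at the observation points."""
    a_mat = [[gaussian_cov(p, q, r_g, c_d) + (r_o ** 2 if j == k else 0.0)
              for k, q in enumerate(obs_pts)] for j, p in enumerate(obs_pts)]
    coeffs = solve(a_mat, list(increments))           # the a_k
    c_vec = [gaussian_cov(grid_pt, q, r_g, c_d) for q in obs_pts]
    correction = sum(a * c for a, c in zip(coeffs, c_vec))
    weights = solve(a_mat, c_vec)
    err_var = r_g ** 2 - sum(w * c for w, c in zip(weights, c_vec))
    return correction, math.sqrt(max(err_var, 0.0))
```

With a single observation located at the grid point and r_g = 30, r_o = 10, this sketch applies nine tenths of the increment. With c_d = 0 and the observation elsewhere, it applies no correction and the expected rms error stays at the first-guess value of 30.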

One of the practical difficulties of the method is the
specification of a suitable covariance structure. Not only is
this important from the standpoint of modeling the process
properly, but also from the standpoint of obtaining meaningful
estimates of the mean squared error. In fact, these estimates
hold only when the covariance structure is known. When a
particular structure is assumed, with parameter values being estimated
from  a  time  history  or  otherwise  specified  inexactly,  these 
estimates  may  differ  substantially  from  the  actual  values 
(Franke,  1985).   As  a  matter  of  terminology,  it  is  noted  that 
when  the  process  is  applied  using  empirically  derived,  or  assumed 
covariance  functions,  the  scheme  is  called  "statistical  inter- 
polation" in  the  meteorology  literature.   It  is  easily  observed 
that  the  accuracy  of  the  scheme  is  closely  related  to  a  somewhat 
nebulous  quantity  which  I  will  refer  to  as  the  "correlation 
distance".   This  quantity  indicates  something  about  the  distance 
at  which  the  spatial  correlation  in  the  first-guess  values  drops 
below  a  certain  level.   If  the  distance  from  observation  points 
to  the  analysis  point  (a  grid  point,  in  this  case)  is  greater 
than  the  correlation  distance,  then  the  scheme  cannot  perform 
well, and in fact may only improve the value slightly. Thus the
performance of the method is strongly dependent on the first-
guess errors being correlated: the higher the correlation,
the better.

The scheme proposed by Wahba and Wendelberger (1980) is
based  on  the  use  of  LSS.   These  functions  were  first  introduced 
as  interpolation  functions  by  Harder  and  Desmarais  (1972),  and 
were  later  developed  more  fully  by  Duchon  (1976,  1977)  and 
Meinguet  (1979,  1979a).   The  generalization  to  smoothing  and 
their  application  to  the  objective  analysis  problem  was  by  Wahba 
and Wendelberger. The functions obtain their name from, and are
characterized by, minimization of a functional related to the
iterated Laplacian, $\Delta^m$:

$$\lambda \sum_{i+j=m} \binom{m}{i} \iint \left( \frac{\partial^m H}{\partial\theta^i\, \partial\phi^j} \right)^2 dA \;+\; N_o^{-1}\, (\Delta H)^T\, \Sigma^{-1}\, (\Delta H).$$

Here $\lambda$ is a smoothing parameter, and the order of the LSS,
$m$ ($> 1$), determines the smoothness of the function in terms of
the number of continuous derivatives it must have. The
$N_o$-vector $\Delta H$ contains the differences between the
approximation values and the data values, and $\Sigma$ is the
covariance matrix between the errors in the data taken over an
ensemble of realizations. In the context of the objective analysis
problem being considered, the solution of the problem can be shown
to be a function of the form

$$H(P) = \sum_{k=1}^{N_o} A_k\, B(P, P_k) + \sum_{i+j<m} b_{ij}\, \theta^i \phi^j,$$

where the independent variables are taken to be longitude, $\theta$,
and latitude, $\phi$, and the data points are $(P_k, H_k)$ with
$P_k = (\theta_k, \phi_k)$, $k = 1, \ldots, N_o$. The basis functions
for the approximation depend on the number of independent
variables. For the case of two independent variables the basis
functions can be taken to be
$B(P,Q) = \|P-Q\|^{2m-2} \log \|P-Q\|$, where $\|P-Q\|$ is the
distance in degrees between points $P$ and $Q$. The coefficients
$A_k$, and those of the polynomial $\sum b_{ij}\, \theta^i \phi^j$,
satisfy the system of equations

Before the system of equations can be solved for the
coefficients, the smoothing parameter $\lambda$ must be specified.
Wahba and Wendelberger (1980) show how to choose this smoothing
parameter using GCV. In simple cross validation $\lambda$ is
selected to minimize the sum of squared errors measured by
sequentially predicting the value at each data point when it is
omitted from the set, then summing over all data points. This
turns out to be an unreasonably expensive calculation, and GCV is
a procedure for estimating the minimizing parameter in the
particular realization.
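
To make the GCV idea concrete, the sketch below fits an m = 2 smoothing spline with the basis r^2 log r and evaluates the usual GCV function, V = (RSS/n) / (1 - tr(A)/n)^2, where A is the influence matrix mapping data values to fitted values. It is a simplified illustration, not the MSSP program: it assumes independent, equal-variance data errors (the covariance matrix is the identity), the penalty weight `alpha` equals the smoothing parameter only up to a constant scale factor, and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def tps_basis(p, q):
    """B(P,Q) = |P-Q|^2 log |P-Q| (the m = 2 case), taken as 0 at zero distance."""
    r = math.hypot(p[0] - q[0], p[1] - q[1])
    return 0.0 if r == 0.0 else r * r * math.log(r)

def tps_smooth_values(pts, values, alpha):
    """Fitted values at the data points for penalty weight alpha > 0."""
    n = len(pts)
    a = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, p in enumerate(pts):
        for j, q in enumerate(pts):
            a[i][j] = tps_basis(p, q) + (alpha if i == j else 0.0)
        for c, term in enumerate((1.0, p[0], p[1])):  # linear polynomial part
            a[i][n + c] = term
            a[n + c][i] = term
    sol = solve(a, list(values) + [0.0, 0.0, 0.0])
    # Row i of the system gives fitted_i + alpha * coef_i = value_i.
    return [values[i] - alpha * sol[i] for i in range(n)]

def influence_trace(pts, alpha):
    """Trace of the influence matrix, found by fitting each unit vector."""
    n = len(pts)
    return sum(tps_smooth_values(pts, [1.0 if k == i else 0.0 for k in range(n)],
                                 alpha)[i] for i in range(n))

def gcv_score(pts, values, alpha):
    """GCV function value for one realization."""
    n = len(pts)
    fitted = tps_smooth_values(pts, values, alpha)
    rss = sum((y - f) ** 2 for y, f in zip(values, fitted))
    return (rss / n) / (1.0 - influence_trace(pts, alpha) / n) ** 2
```

A smoothing parameter could then be picked for one realization by a simple search, for example `min(candidates, key=lambda a: gcv_score(pts, values, a))` over a list of trial values. A crude search of this kind may land in a local minimum when several exist.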

Since Optimum Interpolation (OI) is typically used in
meteorological analysis, the performance of LSS and GCV was measured
relative to that of OI. In the ensemble mean-squared error
sense, OI must perform at least as well as any other scheme based
on making corrections to a first-guess field. As was shown in
Franke (1985), the simulation program yields rms errors which
compare very favorably with the predicted values from the scheme,
so it was not necessary to run the simulations for OI. The
simulations for OI are quite inexpensive to compute, however, and
some were run as a check of the simulation program. In all the
simulations, the spatial covariance of the first-guess errors
was assumed to be Gaussian,

$$C(P,Q) = r_g^2 \exp\left(-\left(\|P-Q\| / c_d\right)^2\right).$$

Here $c_d$ is a parameter, referred to in the sequel as the
correlation distance, and $r_g^2$ is the variance of the
first-guess error.
The  use  of  Gaussian  correlation  functions  and  distance  in  degrees 
is  not  necessarily  the  best  assumption  that  could  be  made.   For 
example, recent work by Thiebaux (1985) has shown that
autoregressive correlation models approximate the actual first-guess
error data better than Gaussian functions, and have other
properties required in the multivariate case, as well.
However,  the  overall  results  of  this  study  would  probably  be 
altered  only  slightly  by  use  of  other  correlation  functions  and 
distance  in  kilometers. 
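
As a small worked example of what the correlation distance means under the Gaussian model, dividing the covariance by the first-guess error variance gives the correlation as a function of separation $d$ (the symbol $\rho$ is introduced here only for illustration):

```latex
\rho(d) = \frac{C(P,Q)}{r_g^2} = \exp\!\left(-\left(d / c_d\right)^2\right),
\qquad d = \|P-Q\|,
\\[4pt]
\rho(c_d) = e^{-1} \approx 0.368,
\qquad
\rho(2 c_d) = e^{-4} \approx 0.018 .
```

So at a separation of one correlation distance the first-guess errors retain roughly a third of their correlation, and at two correlation distances they are nearly uncorrelated.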

3.   Results  of  the  study 

A  number  of  simulations  with  different  grid  and  observation 
point  sets,  correlation  distances,  and  order  of  the  LSS,  were 
computed.   A  table  giving  the  parameters  of  most  of  these  simula- 
tions, including  the  resulting  rms  errors  of  the  estimated  grid 
point  values  is  given  in  Table  1.   All  simulations  used  values  of 
30  m  and  10  m  for  the  standard  deviations  of  the  first-guess  and 
observation  errors,  respectively. 

The efficacy of the generalized cross validation (GCV)
process as a scheme for choosing the smoothing parameter was one of
the primary points investigated. An attempt was made to
determine whether the rms error resulting from the choice of
smoothing parameter by GCV was related to any other parameters in
the particular realization. With given parameters, a set of 50 (or
100, for the 8x6 and 5x5 grids) realizations was generated, and
the rms errors of the analyzed values at the grid points, along
with the rms first-guess and observation errors, the smoothing
parameter value, and the GCV function value were tabulated. The
realizations were then repeated using a smoothing parameter value
determined from an "eye-ball" average of the $\lambda$ values
obtained through GCV over all realizations in the particular ensemble.

Table 1 shows the results of LSS simulations with several
different processes. The objective analysis processes used were
(1) the Wahba and Wendelberger method with no first guess and
$c_d$ = 10°, (2) the correction to first-guess method with LSS
applied to the first-guess error and $c_d$ = 10°, and (3) the
unified scheme with various correlation distances specified. For
the 13x9 and 8x6 grids, the three values m=2, 3, and 4 were used.
Wahba and Wendelberger had previously reported that for similar
data, m=4 or 5 seemed to be appropriate. The simulations performed
here indicate that for the particular underlying function used,
m=3 or 4 is best. Though not discernible from the table, the GCV
function generally was found to have multiple local minima,
especially for the larger data sets when the first-guess errors
were highly correlated (large $c_d$). The results for m=4 and
$c_d$ = 10° are not completely reliable since three of the cases
failed in the determination of the smoothing parameter using GCV,
and five others gave very poor results. The failures were probably
caused by inexact computations of the square root of the
correlation matrix in the LSS program, because the correlation
matrix is poorly conditioned with respect to the precision used in
the computations (double precision (REAL*8) on an IBM computer). No
failures occurred when the smoothing parameter was specified.

The principal results to be drawn from Table 1 are: (1) making
corrections to the first-guess field always gave better analyzed
values than not using the first-guess field; (2) the unified scheme
(without GCV) gave better analyzed values than the no first-guess
process; and (3) decreasing the correlation distance results in some
variation of the rms errors in the analyzed values, but the errors
do not tend to increase greatly as the correlation distance is
decreased. The last result is discussed in more detail later.

In  addition  to  the  tabulation  shown,  a  number  of  plots  of 
various  parameters  versus  rms  error  in  the  analyzed  values  for 
some  of  the  sets  of  GCV  realizations  were  made.   Some  of  those 
are  reproduced  here,  showing  a  typical  range  of  behavior.   In 
Figures 5-10, the simulations were on the 13x9 grid (Figure 1),
with a correlation distance of $c_d$ = 7.5°, and smoothness
parameter m = 4. One point is off the graph area and its projection
onto  the  boundary  is  shown.   Figure  5  shows  the  rms  errors  of  the 
analyzed  values  for  the  non-GCV  simulations  versus  the  rms  errors 
of  the  analyzed  values  for  the  corresponding  GCV  simulations. 
These  appear  to  be  correlated  fairly  well.   The  total  rms  error 
is  smaller  for  analyses  using  a  specified  smoothing  parameter 
value than for those obtained using GCV. Figures 6-10 show
scatter diagrams of first-guess rms error, observation error,
ratio of rms first-guess to rms observation error, log $\lambda$,
and GCV function value, respectively, versus the rms error in the
analyzed values obtained with GCV. No correlation between these
sets  of  values  is  apparent,  and  in  particular  the  GCV  function 
value  does  not  seem  to  be  correlated  with  the  actual  rms  errors 
in  the  analyzed  values.   Thus  it  would  appear  that  while  compu- 
ting the  GCV  function  gives  one  something  to  minimize,  in  this 
problem  it  is  not  true  that  the  minimum  of  it  corresponds  to  a 
minimum in the rms error of the analyzed values. This is further
borne out by the generally smaller errors obtained by
specifying a constant value for the smoothing parameter. The other
parameter which might possibly be related to the minimum value of
the function is the value of the smoothing parameter; however,
Figure 9 again shows no particular evidence of correlation. None
of the available parameters seems to be indicative of "extreme"
cases; in particular, such cases are not detectable either from the
GCV value or the smoothing parameter value. The one exception to
that is the extreme point which is shown on the boundary, which
does correspond to a very small value of the smoothing parameter,
$\lambda$. This implies that little smoothing was applied for this
particular realization. The case also corresponds to a relatively
small value of the ratio of rms first-guess error to rms
observation error.

Figures 11-16 show the corresponding plots for realizations
incorporating uncorrelated first-guess error (correlation
distance $c_d$ = 0), again for the 13x9 grid. Except for there being
no cases giving really poor performance of GCV in these
realizations, the behavior is basically the same as in Figures 5-10.
The only evidence of correlated values is between the rms errors of
the analyzed values with and without GCV. Other plots for
variations in correlation distances, smoothness parameters, grids and
observation point sets support these results.

The relative constancy of the rms errors obtained by the
LSS as the correlation distance is varied, as opposed to the
rapid increase in errors obtained by OI as the correlation
distance decreases, is thought-provoking. One is easily convinced
that since OI is based on the idea of a correction to first-guess
errors and since the successful application of OI depends on
correlated first-guess errors (the more strongly correlated, the
better), OI cannot be very successful in regions where the
distance between observation points is a significant fraction of the
correlation distance (or perhaps, where the density of
observation points per unit of correlation distance area is
small). This behavior is seen in the last column of Table 2,
which also summarizes the results for m=4 on the 3 grids used in
the simulations. Observe that no correction can be expected to
be made if the first-guess errors are uncorrelated ($c_d$ = 0). The
relationship is complex, as is seen through the inversion of the
system of equations for the coefficients in the approximation, and
could be expected to depend heavily on distances to several
nearby observation points as well as the first-guess grid size.
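
The dependence on correlation distance can be seen in the simplest possible case of a single observation. The sketch below assumes the standard optimum interpolation expression for the expected error variance, which for one observation at distance d reduces to r_g^2 - C(d)^2 / (r_g^2 + r_o^2), together with the Gaussian covariance of Section 2. The function name and defaults are hypothetical, and the single-observation setting is an illustration rather than one of the configurations simulated here.

```python
import math

def oi_expected_rms(distance, c_d, r_g=30.0, r_o=10.0):
    """Expected rms analysis error at a grid point corrected by one
    observation located `distance` degrees away."""
    if distance == 0.0:
        cov = r_g ** 2                      # observation at the grid point
    elif c_d == 0.0:
        cov = 0.0                           # uncorrelated first-guess errors
    else:
        cov = r_g ** 2 * math.exp(-(distance / c_d) ** 2)
    return math.sqrt(r_g ** 2 - cov ** 2 / (r_g ** 2 + r_o ** 2))

# Expected rms error for an observation 5 degrees away, for several c_d.
rms_by_cd = {c_d: oi_expected_rms(5.0, c_d) for c_d in (10.0, 5.0, 2.5, 0.0)}
```

In this sketch the expected error rises toward the first-guess value of 30 as the correlation distance shrinks, and equals 30 when c_d = 0, which matches the OI entries of 30.00 for $c_d$ = 0 in Table 2.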

The phenomenon is more clearly illustrated by Figures 17-19,
which graphically show some of the data of Table 2. Figure 17
shows the rms errors as a function of correlation distance for
unified LSS for m=4 with and without GCV, and for OI from
simulations, along with the expected rms error from OI. Figures 18
and 19 show the corresponding data for the 8x6 grid and the 5x5
grid, again with m=4.

4.   Conclusions 

This  investigation  has  been  primarily  concerned  with  the 
performance  of  generalized  cross  validation  in  conjunction  with 
its  use  to  determine  the  smoothing  parameter  for  Laplacian 
smoothing  splines  applied  to  the  objective  analysis  problem  in 
numerical  weather  prediction.   While  the  simulations  performed 
have been within that context, I feel that the results have
general applicability and would be similar, independent of the
source of the data. Nonetheless, the results will be discussed
in terms of the setting in which they were performed. It is
apparent that the routine, day-to-day use of GCV is neither a
suitable nor a cost-effective way to determine the smoothing
parameter for LSS. On the other hand, GCV does seem to be useful
for determining a single suitable value to use for all
realizations in some particular ensemble. No effort was made to
determine the optimum value of $\lambda$ to use for any set of
realizations in the simulations, although in some cases a set was
run with more than one value of $\lambda$. The results indicated
that the "eye-ball" average used was a good value, although it
could be improved on if there is access to the actual errors. In
practice, of course, this is not the case.

The use of LSS in a unified sense to treat both first-guess
and observations in the same manner looks promising in regions
where the observations are sparse. My investigation here is not
really complete, however, and some additional work is necessary
to verify this apparent conclusion. In particular,
the simulations had perfect knowledge of the statistical
characteristics of both the first-guess and observation error,
and in practice this is impossible. An investigation of the
sensitivity of both statistical interpolation and LSS to
erroneous specification of the statistical characteristics of the
errors is planned. In addition to this, several sets of grids
with sparse observations will be used in the study. For
statistical interpolation it is possible to find the rms errors
over a given ensemble of realizations without simulation (see
Seaman, 1983). This may be possible for LSS when a smoothing
parameter is specified. While the simulation program is available
and gives very good results (as can be seen in Figures 17-19 for
OI), Seaman's approach requires considerably less computation.

All simulations reported on here were univariate. In its
current practical applications OI is applied in a multivariate
setting. As noted by Wahba and Wendelberger, LSS is also
applicable in the multivariate setting, but the method has not been
rigorously tested, since they computed only a small number of
examples. There is no reason to suspect that LSS will perform
any less well, compared to OI, in this setting than it does in
the univariate case. It is necessary to perform some comparable
analyses for the two methods to verify this, however, and such a
study is anticipated in the near future.

Type                      Lambda                 rms error   rms error   rms error
                          (A^b = A x 10^b)       m = 2       m = 3       m = 4

No first-guess            GCV                      8.65        7.63        8.42
cd = 10°                  5^-3, 5^-2, 2^-1         8.55        7.44        7.63

Corrections to            GCV                      7.16        7.44        7.63
first-guess, cd = 10°     3^-2, 1^-1, 1^0          6.77        6.82        7.18

Unified                   GCV                     18.44       25.70      136.42
cd = 10°                  1^-?, 2^-4, 4^-3         6.64        6.37        6.37

Unified                   GCV                                              6.94
cd = 7.5°                 ?                                                6.60

Unified                   GCV                      8.82        7.05        6.50
cd = 5°                   25^-6, 25^-5, 25^-4      8.47        6.84        6.48

Unified                   GCV                      8.43        6.15        5.95
cd = 0°                   25^-6, 25^-5, 25^-4      7.09        5.94        5.75

Table 1A: rms errors of the analyzed values for GCV and non-GCV
simulations on the 13x9 grid with 36 observation locations.
Specified error parameters were $r_g$ = 30, $r_o$ = 10.


Type                      Lambda                 rms error   rms error   rms error
                          (A^b = A x 10^b)       m = 2       m = 3       m = 4

No first-guess            GCV                     10.10        9.75       10.86
cd = 10°                  1^-?, 1^-1, 1^-1         9.80        8.79        9.85

Corrections to            GCV                      7.00        7.58       10.33
first-guess               1^-1, 5^-1, 1^0          6.48        6.60        3.25

Unified                   GCV                      9.67        9.60        9.42
cd = 10°                  2^-5, 3^-4, 8^-3         6.28        6.17        6.25

Unified                   GCV                     10.83        8.09        7.91
cd = 5°                   5^-5, 4^-4, 25^-4        8.58        7.43        7.07

Unified                   GCV                      9.22        8.11        8.23
cd = 0°                   4^-5, 4^-4, 4^-3         7.52        6.87        6.95

Table 1B: rms errors of the analyzed values for GCV and non-GCV
simulations on the 8x6 grid with 16 observation points.
Specified error parameters were $r_g$ = 30, $r_o$ = 10.

Type                      Lambda                 rms error   rms error   rms error
                          (A^b = A x 10^b)       m = 2       m = 3       m = 4

Unified                   GCV                     13.94       12.55       10.80
cd = 10°                  25^-6, 25^-5, 25^-4      8.76        8.64        8.63

Unified                   GCV                     12.55       10.25       11.11
cd = 5°                   25^-6, 25^-5, 25^-4     11.42       10.01        9.96

Unified                   GCV                     13.93       12.13       14.31
cd = 2.5°                 25^-6, 1^-3, 1^-3       11.87        9.19       10.64

Table 1C: rms errors of the analyzed values for GCV and non-GCV
simulations on the 5x5 grid with 4 observation points. Specified
error parameters were $r_g$ = 30, $r_o$ = 10.

Grid   #Obs   cd      GCV2    GCV3    GCV4    NGCV2   NGCV3   NGCV4   OI

13x9   36     10°     18.44   25.70   136.4    6.64    6.82    6.37    6.29
              7.5°                     6.94                    6.60    7.79
              5°       8.82    7.05    6.50    8.47    6.84    6.48   11.59
              0°       8.43    6.15    5.95    7.09    5.94    5.75   30.00

8x6    16     10°      9.67    9.60    9.42    6.28    6.17    6.25    6.14
              5°      10.83    8.09    7.91    8.58    7.43    7.07   10.66
              0°       9.22    8.11    8.23    7.52    6.87    6.95   30.00

5x5    4      10°     13.94   12.55   10.80    8.76    8.64    8.63    8.22
              5°      12.55   10.25   11.11   11.42   10.01    9.96   13.71
              2.5°    13.93   12.13   14.31   11.87    9.19   10.64   23.36

TABLE 2: rms errors in the corrected grid values obtained with
various simulation runs. GCVm denotes GCV was used to estimate
the smoothing parameter for the Laplacian smoothing spline of
order m. NGCVm denotes GCV was not used with the Laplacian
smoothing spline of order m. OI denotes the error estimate from
Optimum Interpolation for the corresponding parameters. Other
parameters used were $r_g$ = 30, $r_o$ = 10.

[Figure panel title: 13 x 9 grid, 36 observations; LSS, cd = 7.5]